Abstract

A disordered medium is often constructed from $N$ points independently and identically distributed in a $d$-dimensional hyperspace. The study of the statistics of this system is known as the random point problem. As $d \to \infty$, the distances between pairs of points become independent random variables, leading to its mean field description: the random link model. While the numerical treatment of large random point problems poses no major difficulty, the same is not true for large random link systems because of the Euclidean restrictions. Exploiting the deterministic nature of congruential pseudo-random number generators, we present techniques that allow the simulation of models with memory consumption of order $O(N)$, instead of $O(N^2)$ in a naive implementation, while keeping the same $O(N^2)$ time dependence.
1 Introduction



The random point problem (RPP) is a classical approach to constructing disordered (random) media. In this problem, $N$ points are independently and identically distributed (i.i.d.) along the edges of a $d$-dimensional hypercube. Because of boundary effects and the triangle inequality, the distances between pairs of points are not all independent random variables. Periodic boundary conditions are frequently used to minimize boundary effects. For fixed $N$, as the system dimensionality increases, boundary effects become more and more pronounced and, when periodic boundary conditions are used, the distances become less and less correlated. As $d \to \infty$, all two-point distances are i.i.d. random variables; this limit is known as the random link (or random distance) model (RLM) [1]. In the RLM, there are two Euclidean constraints:

(i) the distance from a point to itself is always null ($D_{ii} = 0$ for all $i$), and

(ii) the forward and backward distances are equal ($D_{ij} = D_{ji}$ for all $i, j$).

If both Euclidean constraints are relaxed, this model becomes the random map model [2,3], which is the mean field approximation for Kauffman automata [4].

Both the RPP and the RLM (the mean field description of the RPP) have been very fruitful in producing numerical and analytical results for several interesting systems. These applications range from statistics of the optimal trajectories in the traveling salesman problem on a random set of cities [5,6,7,8,9], through frustrated dimerization optimization modeled by the minimum matching problem [10,11] (or, equivalently, spin glasses [10]), to partially self-avoiding deterministic [12,13] and stochastic [14,15] walks. The high-dimensional case of these partially self-avoiding walks has been our main motivation for considering the RLM. In the deterministic walk [16,17,18,19], one is interested only in the neighborhood ranking of the random points. Indeed, Euclidean distances are only a means of obtaining this ranking, which is independent of the particular choice of distance probability distribution function (pdf) [13]. Here we consider only a uniform distance pdf; nevertheless, the algorithm can easily be adapted to any distance pdf, such as the one with the pseudo-dimension parameter [8].

In the simplest congruential implementation, a numerical random-number generator is a deterministic algorithm initialized by a single integer variable $S_1$, called the seed. At each generator call, the current seed is deterministically modified, giving rise to the (ideally uncorrelated) sequence of integers $S_2, S_3, \ldots$, uniformly distributed in the interval $[0, 2^m - 1[$ for a given $m > 0$. After running through all values in this interval, the seed reassumes its initial value $S_1$ and the cycle restarts. For each integer seed value, the pseudo-random number generator commonly returns a real number uniformly distributed in the interval $[0, 1[$, while keeping track of the seed value.
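As an illustration of this seed bookkeeping, the following Python sketch implements a linear congruential generator in which the caller carries the seed explicitly. The constants `A = 1664525`, `C = 1013904223` and `M = 2**32` are an assumption (the widely quoted Numerical Recipes "quick and dirty" values); the text does not state which generator was used, and the function names are hypothetical.

```python
# Minimal linear congruential generator (LCG) sketch.
# ASSUMPTION: these constants are illustrative, not the paper's generator.
A = 1664525
C = 1013904223
M = 2**32  # m = 32, so seeds lie in [0, 2**32)


def lcg_next(seed):
    """Advance the integer seed deterministically: S_{k+1} = (A*S_k + C) mod M."""
    return (A * seed + C) % M


def lcg_uniform(seed):
    """Return (u, new_seed), with u a real number in [0, 1).

    The caller keeps the seed, so any part of the sequence can be
    regenerated later simply by restarting from a stored seed.
    """
    new_seed = lcg_next(seed)
    return new_seed / M, new_seed
```

Because the whole generator state is the integer seed, storing a seed is enough to replay any portion of the sequence; this is the property exploited in Sec. 3. With these particular constants the Hull-Dobell conditions hold, so the cycle length is the full $2^{32}$.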
In Sec. 2, the RPP is considered and its numerical implementation is discussed. Next, in Sec. 3, we consider the RLM and show that the straightforward implementation based on the distance matrix permits the simulation of only small systems. We then consider two alternative algorithms for generating the random Euclidean distances that exploit the reproducibility of pseudo-random number generators. Final remarks are given in Sec. 4.



2 Random Point Problem 

Consider a disordered medium made of $N$ points embedded in a $d$-dimensional Euclidean hyperspace. The coordinates $x_i^{(k)}$ of these points ($i = 1, \ldots, N$; $k = 1, \ldots, d$) are generated independently following a given common pdf $p_d(x)$ (for instance, uniform on a line segment of length $L$). The distance between any pair of points $i$ and $j$ is given by the Euclidean metric:

$$D_{ij} = \sqrt{\sum_{k=1}^{d} \left( x_i^{(k)} - x_j^{(k)} \right)^2}. \qquad (1)$$

A possible computational implementation of the RPP consists mainly of the following two steps: (i) randomly generate the coordinates $x_i^{(k)}$ and store them in an $N \times d$ matrix (the coordinate matrix), and (ii) use Eq. (1) to calculate the distance between every pair of points. If one wishes to compare distances among points, one must store the distance values in an $N \times N$ matrix (the distance matrix), leading to $O(N^2)$ time and memory consumption. The declaration of the distance matrix requires a large amount of computer memory, which in numerical applications limits system sizes (typically to $N = 720$ with FORTRAN compilers and $N = 15000$ in C++). An alternative procedure avoids declaring the distance matrix. It consists in declaring a vector of size $N$ (the mask) instead of the $N \times N$ distance matrix and calculating only the distances related to a given point at each time step (for instance, to determine its nearest neighbors). Thus, the coordinate matrix, via the mask, saves us from declaring the distance matrix. The only computational waste is that the same distance ($D_{ij} = D_{ji}$) is calculated twice (it could have been calculated only once if the distance matrix were available). Nevertheless, the time dependence remains proportional to $N^2$.
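The mask procedure can be sketched in Python as follows: the coordinate matrix is kept, Eq. (1) is evaluated on demand, and only one length-$N$ vector of distances exists at a time. Drawing coordinates with Python's `random` module, the plain (non-periodic) metric, and all function names are assumptions made for illustration.

```python
import math
import random


def make_coordinates(n, d, length=1.0, seed=0):
    """N x d coordinate matrix, each coordinate uniform on a segment of `length`."""
    rng = random.Random(seed)
    return [[rng.random() * length for _ in range(d)] for _ in range(n)]


def euclidean(p, q):
    """Eq. (1): D_ij = sqrt(sum_k (x_i^(k) - x_j^(k))^2), without periodic wrapping."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_mask(coords, i):
    """Fill a length-N vector (the mask) with the distances from point i.

    Only O(N) extra memory is used; D_ij is recomputed when the mask of
    point j is built, which is the price of not storing the N x N matrix.
    """
    return [euclidean(coords[i], coords[j]) for j in range(len(coords))]


def nearest_neighbor(coords, i):
    """Index of the point closest to point i, found from its mask."""
    mask = distance_mask(coords, i)
    mask[i] = math.inf  # exclude the point itself (D_ii = 0)
    return min(range(len(mask)), key=mask.__getitem__)
```

Sweeping `nearest_neighbor` over all $N$ points performs $N$ mask evaluations of $N$ distances each, which is the $O(N^2)$ time noted above.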

To minimize boundary effects, it is important to use periodic boundary conditions and to keep the mean point separation fixed, $\ell = \rho^{-1/d} = L/N^{1/d}$, where $L$ is a typical system size and $\rho$ the point density; hence $N$ must increase as the system dimensionality increases. Since high-dimensional systems are to be considered, even the declaration of the coordinate matrix may consume a lot of computer random access memory (RAM). This introduces additional computational difficulties, since the system size then runs into a barrier imposed by the dimensionality.
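To make the dimensional barrier concrete, solving $\ell = L/N^{1/d}$ for $N$ gives $N = (L/\ell)^d$. The short Python sketch below evaluates this and the resulting size of the coordinate matrix; the 8-byte real per coordinate and the ratio $L/\ell = 10$ are illustrative assumptions, and the function names are hypothetical.

```python
def points_needed(length, ell, d):
    """Number of points that keeps the mean separation ell = L / N**(1/d) fixed.

    Solving ell = L / N**(1/d) for N gives N = (L / ell)**d.
    """
    return round((length / ell) ** d)


def coordinate_matrix_bytes(n, d, bytes_per_value=8):
    """RAM for an N x d coordinate matrix, ASSUMING 8-byte reals."""
    return n * d * bytes_per_value


# Example: ten mean separations per side (L / ell = 10).
for d in (2, 3, 6, 9):
    n = points_needed(10.0, 1.0, d)
    print(d, n, coordinate_matrix_bytes(n, d))
```

For $d = 9$ this already means $10^9$ points and roughly 72 GB for the coordinates alone.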

Nevertheless, owing to the periodic boundary conditions and to the increase in dimensionality, the correlations among the distances (the triangle inequality, for example) are weakened, so that in the high-dimensional limit ($d \to \infty$) the distances between pairs of points can be considered as $N(N-1)/2$ i.i.d. random variables. Thus, the RPP converges to the RLM.



3 Random Link Model 

To work numerically with the RLM, one must generate the i.i.d. random distances directly. This removes the large RAM allocation problem caused by high dimensionality. However, since there is no coordinate matrix, the mask used for finite-dimensional systems (RPP) cannot be applied. Moreover, keeping track of the symmetry restriction ($D_{ij} = D_{ji}$) imposes serious numerical difficulties, so that the distance matrix must be declared. The symmetry restriction therefore limits the use of the RLM to small computational systems: with standard memory allocation, systems can have up to $N = 10^3$ points.

3.1 Conventional Implementation

A straightforward implementation of the i.i.d. random distances in the RLM is as follows. All values on the main diagonal of the distance matrix are null, and a single seed $S_1$ is used to sequentially generate the distances to the right of the main diagonal (the distances to the left of the diagonal are obtained from $D_{ji} = D_{ij}$). Both the time and the memory needed to run this algorithm are proportional to $N^2$ (see Table 1).
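A minimal Python sketch of this conventional scheme is given below. The LCG constants are the same illustrative assumption as before (not necessarily the authors' generator), indices are 0-based, and the function names are hypothetical.

```python
A, C, M = 1664525, 1013904223, 2**32  # ASSUMED illustrative LCG constants


def lcg_uniform(seed):
    """One generator call: return (u in [0, 1), new seed)."""
    seed = (A * seed + C) % M
    return seed / M, seed


def conventional_matrix(n, s1):
    """Full N x N matrix of uniform i.i.d. links: O(N^2) time and memory.

    Starting from the single seed s1, successive calls fill each row to
    the right of the diagonal; the left side is copied (D_ji = D_ij).
    """
    dist = [[0.0] * n for _ in range(n)]  # main diagonal stays 0 (D_ii = 0)
    seed = s1
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j], seed = lcg_uniform(seed)
            dist[j][i] = dist[i][j]  # symmetry restriction
    return dist
```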

We present below two methods that replace the distance matrix by a mask, just as in the RPP. To re-obtain a distance (obeying the symmetry restriction), the deterministic nature of the pseudo-random number generator is exploited.

3.2 One-Seed Method 

The first method, called the one-seed method, reproduces any distance simply by initializing the generator with the seed $S_1$ and calling it a given number of times. The only difficulty of this implementation is to introduce and control a counter that keeps track of the exact number of calls to the random-number generator. The number of new random variables needed for the $j$th row of the distance matrix is $N - j$, and this value must be added to the counter at each new row.

This method allows much larger systems to be constructed numerically, but at the expense of a much longer computation time. To generate all distances in an $N$-point map, the memory consumption is of order $N$ because of the vector allocation (see Table 1), while the required time is proportional to $N^3$ because of the cumulative number of calls to the generator (see Fig. 2).
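The counter logic can be sketched in Python as below: the call index of $D_{rc}$ ($r < c$) in the sequence started at $S_1$ is the number of right-hand entries in the earlier rows plus the offset within row $r$. The LCG constants, the 0-based indexing, and the names `call_index` and `one_seed_row` are assumptions for illustration.

```python
A, C, M = 1664525, 1013904223, 2**32  # ASSUMED illustrative LCG constants


def lcg_uniform(seed):
    """One generator call: return (u in [0, 1), new seed)."""
    seed = (A * seed + C) % M
    return seed / M, seed


def call_index(r, c, n):
    """0-based position of D_rc (r < c) in the sequence started at S_1.

    Rows 0..r-1 consume sum_{q<r} (n - 1 - q) = r*(n-1) - r*(r-1)//2 calls.
    """
    return r * (n - 1) - r * (r - 1) // 2 + (c - r - 1)


def one_seed_row(i, n, s1):
    """Row i of the distance matrix from the single seed s1 and a counter.

    Memory is O(N).  The wanted call indices increase with j, so one pass
    of the generator from s1 suffices, but that pass makes O(N^2) calls,
    giving O(N^3) time for all rows.
    """
    row = [0.0] * n
    targets = {call_index(min(i, j), max(i, j), n): j for j in range(n) if j != i}
    if not targets:
        return row
    seed, counter, last = s1, 0, max(targets)
    while counter <= last:
        u, seed = lcg_uniform(seed)
        if counter in targets:
            row[targets[counter]] = u
        counter += 1
    return row
```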



Method           Time      Memory
Conventional     $N^2$     $N^2$
One-Seed         $N^3$     $N$
Multiple-Seed    $N^2$     $N$

Table 1
Memory and time consumption for the conventional, one-seed, and multiple-seed methods. The multiple-seed method is the best one, since it combines the low $O(N)$ memory consumption of the one-seed method with the low $O(N^2)$ time of the conventional implementation.



3.3 Multiple-Seed Method 



An improvement on the one-seed method is provided by the multiple-seed method, which relies on the following observations. Along the first row of the distance matrix, except for $D_{11} = 0$, all distances are new i.i.d. random variables and are generated simply by making successive calls to the pseudo-random number generator. Along the following rows, the distances to the right of the main diagonal are also new (i.i.d.) random variables and are generated in the same way. However, because of the model symmetry, the distances to the left of the main diagonal are not new random variables. Thus, the same seed $S$ that was used to sequentially generate all the distances to the right of the main diagonal in the $k$th row must be used to sequentially generate all the distances below the main diagonal in the $k$th column (see Fig. 1). To satisfy these requirements, the proposed method uses an integer vector of size $N$ (the seed vector) to store the seeds $S_1, S_N, S_{2N-2}, \ldots$ used to generate the distances in each of the $N$ columns. Strictly, the vector needs at most $N - 1$ entries, but using $N$ avoids unnecessary boundary tests. At each new row, the stored seeds are used to generate the distances on the left-hand side (their new values, modified by the generator, are stored back in the seed vector), and the distances on the right-hand side are generated by simply making successive calls to the pseudo-random number generator (only the first seed of this run is added to the seed vector).
Fig. 1. Scheme of the multiple-seed method, showing how the seeds evolve and how they are stored.

To implement these ideas, consider building the first row of the distance matrix. Store the initial seed $S_1$ in the first entry of the seed vector (to use it again in the second row), generate sequentially the $N - 1$ random distances in the first row, and then temporarily save the last seed $S_N$. To generate the second row, first use the seed $S_1$ stored in the seed vector to re-obtain the distance $D_{21} = D_{12}$, and then store the two seeds $S_2$ and $S_N$ in the first two entries of the seed vector. Next, generate the following $N - 2$ seeds and random distances of the second row, saving the seed $S_{2N-2}$. The third row is generated by the same procedure: use $S_2$ and $S_N$ from the first two entries to re-obtain $D_{31} = D_{13}$ and $D_{32} = D_{23}$, store the updated seeds back in those entries, and add $S_{2N-2}$ to the third entry. This procedure is repeated successively up to the $(N-1)$th row.
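The row-by-row procedure can be sketched in Python as a generator that yields one row at a time, so that only the seed vector and a single row are ever held in memory. The LCG constants, 0-based indexing, and the name `multiple_seed_rows` are illustrative assumptions, not the authors' code.

```python
A, C, M = 1664525, 1013904223, 2**32  # ASSUMED illustrative LCG constants


def lcg_uniform(seed):
    """One generator call: return (u in [0, 1), new seed)."""
    seed = (A * seed + C) % M
    return seed / M, seed


def multiple_seed_rows(n, s1):
    """Yield the rows of the symmetric distance matrix one at a time.

    seeds[k] holds the current seed of the run that generated row k to the
    right of the diagonal; one more call gives the next entry of column k
    below the diagonal.  Memory: the length-n seed vector plus one row.
    Time: one generator call per matrix entry, O(N^2) overall.
    """
    seeds = [0] * n  # seed vector; using n entries avoids boundary tests
    current = s1     # running seed for the right-hand sides
    for i in range(n):
        row = [0.0] * n
        # Left of the diagonal: D_ik = D_ki, regenerated from stored seeds,
        # and the advanced seeds are written back.
        for k in range(i):
            row[k], seeds[k] = lcg_uniform(seeds[k])
        # Right of the diagonal: new i.i.d. values; store where this run starts.
        seeds[i] = current
        for j in range(i + 1, n):
            row[j], current = lcg_uniform(current)
        yield row
```

For $n = 3$ and 0-based rows, the stored seeds after row 0 are $S_1$; after row 1 they are $S_2, S_N$; row 2 then replays both, matching the description above.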

One particular aspect deserves attention. In more sophisticated random-number generators, internal static variables must be passed as arguments of the routine in order to obtain the required reproducibility of the random-number generator. Even in this case, the orders of magnitude of the memory and time consumption are not altered.
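As a sketch of this point, the Python class below replaces the integer seed vector by a vector of full generator states. It assumes Python's `random.Random` (a Mersenne Twister) as the "more sophisticated" generator and uses its `getstate()`/`setstate()` methods; the class and method names are hypothetical. Each stored state has a fixed size (several hundred integers), so the memory remains of order $N$, only with a larger constant.

```python
import random


class StreamStates:
    """Seed vector holding full generator states instead of single integers."""

    def __init__(self, n, seed):
        self.main = random.Random(seed)  # running stream (right of the diagonal)
        self.scratch = random.Random()   # used only to replay stored streams
        self.states = [None] * n         # one saved state per column

    def mark(self, k):
        """Remember where the run for row k starts (like storing S_1, S_N, ...)."""
        self.states[k] = self.main.getstate()

    def fresh(self):
        """Next new i.i.d. value from the running stream."""
        return self.main.random()

    def replay(self, k):
        """Next value of stored stream k; the advanced state is saved back."""
        self.scratch.setstate(self.states[k])
        u = self.scratch.random()
        self.states[k] = self.scratch.getstate()
        return u
```

A separate `scratch` instance is used for replays so that restoring an old state never disturbs the running stream.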



4 Conclusion 



Keeping track of the seeds during the construction of all pairwise distances drastically reduces the computation time, from order $N^3$ in the one-seed method to order $N^2$, as in the conventional implementation, while keeping the memory consumption of order $N$, as in the one-seed method. In this way, the algorithm presented here is the best compromise between time and memory consumption.
Fig. 2. Processing time as a function of system size. This dependence is well described by a power law with exponent 2 for the conventional and multiple-seed methods and exponent 3 for the one-seed method. While the conventional method can deal with systems of size around $10^3$, the multiple-seed method can deal with systems more than 100 times larger. The discontinuity in the conventional-method curve corresponds to the moment when swapping to disk started.